Chapter 16: Flattening Nested Structures
The Challenge of Arbitrary Depth
We've worked with recursion on lists, trees, and intervals, but what happens when the structure itself is unpredictable? When you don't know how deep the nesting goes? When a list might contain lists, which contain lists, which contain... anything?
This is the domain of arbitrary-depth recursion: processing structures where the depth is unknown at design time and can vary wildly at runtime.
The Real-World Problem: Configuration File Processing
Consider a realistic scenario: you're building a system that processes configuration files. These configs can be nested to arbitrary depth:
config = {
    "database": {
        "primary": {
            "host": "db1.example.com",
            "port": 5432,
            "credentials": {
                "username": "admin",
                "password": "secret123"
            }
        },
        "replicas": [
            {
                "host": "db2.example.com",
                "port": 5432
            },
            {
                "host": "db3.example.com",
                "port": 5432
            }
        ]
    },
    "cache": {
        "redis": {
            "nodes": [
                {"host": "cache1.example.com", "port": 6379},
                {"host": "cache2.example.com", "port": 6379}
            ]
        }
    },
    "feature_flags": {
        "new_ui": True,
        "beta_features": {
            "advanced_search": False,
            "ai_suggestions": True
        }
    }
}
Your task: Find all password fields in this configuration and mask them for logging purposes. The problem? You don't know:
- How deep the nesting goes
- Whether passwords appear in dicts or lists
- What keys might contain sensitive data
- How many levels of nesting exist
This is our anchor example for this chapter. We'll build mask_sensitive_data(config) and refine it through multiple iterations as we encounter different failure modes.
Why This Problem Demands Recursion
Let's first see why an iterative approach becomes unwieldy:
def mask_sensitive_iterative(data):
    """Attempt to mask passwords iteratively - quickly becomes complex"""
    if isinstance(data, dict):
        result = {}
        for key, value in data.items():
            if key == "password":
                result[key] = "***MASKED***"
            elif isinstance(value, dict):
                # Need to process nested dict... but how deep?
                result[key] = mask_sensitive_iterative(value)  # Wait, we're recursing anyway!
            elif isinstance(value, list):
                # Need to process list items... but they might be dicts or lists!
                result[key] = [mask_sensitive_iterative(item) for item in value]
            else:
                result[key] = value
        return result
    elif isinstance(data, list):
        return [mask_sensitive_iterative(item) for item in data]
    else:
        return data

# Test it
import json

masked = mask_sensitive_iterative(config)
print(json.dumps(masked, indent=2))
Python Output:
{
  "database": {
    "primary": {
      "host": "db1.example.com",
      "port": 5432,
      "credentials": {
        "username": "admin",
        "password": "***MASKED***"
      }
    },
    "replicas": [
      {
        "host": "db2.example.com",
        "port": 5432
      },
      {
        "host": "db3.example.com",
        "port": 5432
      }
    ]
  },
  "cache": {
    "redis": {
      "nodes": [
        {
          "host": "cache1.example.com",
          "port": 6379
        },
        {
          "host": "cache2.example.com",
          "port": 6379
        }
      ]
    }
  },
  "feature_flags": {
    "new_ui": true,
    "beta_features": {
      "advanced_search": false,
      "ai_suggestions": true
    }
  }
}
The insight: Even when trying to write an "iterative" solution, we ended up using recursion! The structure of the problem (arbitrary nesting) naturally leads to a recursive solution. The alternative would be maintaining explicit stacks and state machines, which is far more complex.
The Core Pattern: Type-Based Recursion
When processing nested structures, we follow this pattern:
- Identify the base types (strings, numbers, booleans, None)
- Identify the container types (lists, dicts, tuples, sets)
- Recurse on containers, process base types
This is fundamentally different from list recursion (first + rest) or tree recursion (node + children). Here, the structure itself determines the recursion pattern.
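The three steps above can be sketched as one dispatch function. This is an illustrative skeleton (the `visit` name is ours, not part of the chapter's later code); real processing logic would go in each branch:

```python
def visit(data):
    """Dispatch on type: recurse on containers, return base types as-is."""
    if isinstance(data, dict):
        # Container: preserve keys, recurse on values
        return {key: visit(value) for key, value in data.items()}
    elif isinstance(data, (list, tuple, set)):
        # Container: recurse on items, keep the original container type
        return type(data)(visit(item) for item in data)
    else:
        # Base type (str, int, bool, None, ...): stop recursing
        return data

print(visit({"a": [1, (2, 3)], "b": {"c": None}}))
```

Every function in this chapter is a variation on this skeleton; only the work done in each branch changes.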
Iteration 0: The Naive Approach
Let's start with the simplest possible implementation and see where it breaks.
The Initial Implementation
def flatten_list(nested_list):
    """
    Attempt to flatten a nested list to a single level.
    This is our starting point - it will fail in interesting ways.
    """
    result = []
    for item in nested_list:
        if isinstance(item, list):
            # It's a list, so recurse
            flattened = flatten_list(item)
            result.extend(flattened)
        else:
            # It's not a list, so add it
            result.append(item)
    return result

# Test with simple nested list
simple = [1, [2, 3], [4, [5, 6]], 7]
print(f"Input: {simple}")
print(f"Output: {flatten_list(simple)}")

Python Output:
Input: [1, [2, 3], [4, [5, 6]], 7]
Output: [1, 2, 3, 4, 5, 6, 7]
Great! It works for nested lists. Let's trace the execution to understand what's happening:
Execution Trace:
flatten_list([1, [2, 3], [4, [5, 6]], 7])
→ item=1 (not list) → append 1
→ item=[2, 3] (is list) → recurse
    flatten_list([2, 3])
    → item=2 (not list) → append 2
    → item=3 (not list) → append 3
    → return [2, 3]
→ extend with [2, 3]
→ item=[4, [5, 6]] (is list) → recurse
    flatten_list([4, [5, 6]])
    → item=4 (not list) → append 4
    → item=[5, 6] (is list) → recurse
        flatten_list([5, 6])
        → item=5 (not list) → append 5
        → item=6 (not list) → append 6
        → return [5, 6]
    → extend with [5, 6]
    → return [4, 5, 6]
→ extend with [4, 5, 6]
→ item=7 (not list) → append 7
→ return [1, 2, 3, 4, 5, 6, 7]
The First Failure: Mixed Types
Now let's try something more realistic:
# Test with mixed types
mixed = [1, [2, "hello"], [3, [4, True]], {"key": "value"}]
print(f"Input: {mixed}")
try:
    result = flatten_list(mixed)
    print(f"Output: {result}")
except Exception as e:
    print(f"Error: {type(e).__name__}: {e}")
    import traceback
    traceback.print_exc()

Python Output:
Input: [1, [2, 'hello'], [3, [4, True]], {'key': 'value'}]
Output: [1, 2, 'hello', 3, 4, True, {'key': 'value'}]
Wait, it worked! But look closely at the output: {'key': 'value'} is still there as a dict. We didn't flatten it. Is this correct?
It depends on what "flatten" means. For a list-only flattener, this is correct behavior. But for our configuration masking problem, we need to handle dicts too.
The Second Failure: Tuples and Other Sequences
Let's test with tuples:
# Test with tuples
with_tuples = [1, (2, 3), [4, (5, 6)]]
print(f"Input: {with_tuples}")
result = flatten_list(with_tuples)
print(f"Output: {result}")
print(f"Types: {[type(x) for x in result]}")
Python Output:
Input: [1, (2, 3), [4, (5, 6)]]
Output: [1, (2, 3), 4, (5, 6)]
Types: [<class 'int'>, <class 'tuple'>, <class 'int'>, <class 'tuple'>]
Problem identified: Tuples aren't being flattened! Our isinstance(item, list) check is too narrow.
Diagnostic Analysis: Understanding the Limitation
The core issue: We're checking for a specific type (list) rather than checking for the behavior we care about (being iterable and container-like).
What we need:
- Recognize all sequence types (list, tuple)
- Distinguish between sequences and strings (strings are iterable but shouldn't be flattened)
- Handle dictionaries specially (they're iterable but need different processing)
Root cause: Type-specific checking instead of behavior-based checking.
What we need: A more sophisticated type detection system that understands the hierarchy of Python's data structures.
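As an aside, the standard library's `collections.abc` module offers one way to express behavior-based checks. This is a sketch of that idea (the chapter's own code sticks with explicit type tuples, which is also fine); the helper names here are ours:

```python
from collections.abc import Mapping, Sequence

def is_flattenable(item):
    """Sequence-like values we'd recurse into; strings and bytes stay atomic."""
    return isinstance(item, Sequence) and not isinstance(item, (str, bytes))

def is_mapping(item):
    """Dict-like values that need key-value handling instead of flattening."""
    return isinstance(item, Mapping)

print(is_flattenable([1, 2]))     # True
print(is_flattenable((1, 2)))     # True
print(is_flattenable("abc"))      # False: a sequence, but atomic for our purposes
print(is_mapping({"a": 1}))       # True: dicts are Mappings, not Sequences
```

The advantage is that third-party sequence types (e.g. anything registered as a `Sequence`) are handled automatically; the cost is that the rules become less explicit.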
Iteration 1: Handling Multiple Sequence Types
The Improved Implementation
Let's fix the tuple problem by checking for multiple sequence types:
def flatten_sequences(nested):
    """
    Flatten nested sequences (lists and tuples).
    Handles multiple sequence types but treats strings as atomic.
    """
    result = []
    for item in nested:
        # Check if it's a sequence type we want to flatten
        if isinstance(item, (list, tuple)):
            flattened = flatten_sequences(item)
            result.extend(flattened)
        else:
            result.append(item)
    return result

# Test with mixed sequences
mixed_sequences = [1, (2, 3), [4, (5, [6, 7])], "hello"]
print(f"Input: {mixed_sequences}")
result = flatten_sequences(mixed_sequences)
print(f"Output: {result}")
print(f"Types: {[type(x).__name__ for x in result]}")

Python Output:
Input: [1, (2, 3), [4, (5, [6, 7])], 'hello']
Output: [1, 2, 3, 4, 5, 6, 7, 'hello']
Types: ['int', 'int', 'int', 'int', 'int', 'int', 'int', 'str']
Improvement: Now tuples are flattened correctly, and strings remain atomic (not split into characters).
Why Strings Need Special Treatment
Let's see what happens if we don't exclude strings:
def flatten_sequences_broken(nested):
    """
    BROKEN: Treats strings as sequences to flatten.
    This demonstrates why we need special handling.
    """
    result = []
    for item in nested:
        # This check includes strings because strings are sequences!
        if isinstance(item, (list, tuple, str)):
            flattened = flatten_sequences_broken(item)
            result.extend(flattened)
        else:
            result.append(item)
    return result

# Test with strings
test = [1, "hello", [2, "world"]]
print(f"Input: {test}")
try:
    result = flatten_sequences_broken(test)
    print(f"Output: {result}")
except RecursionError as e:
    print(f"RecursionError: {e}")
    print("\nThis happens because:")
    print("1. 'hello' is a string")
    print("2. Strings are sequences in Python")
    print("3. We recurse on 'hello'")
    print("4. 'hello'[0] is 'h', which is also a string")
    print("5. We recurse on 'h'")
    print("6. 'h'[0] is 'h' again!")
    print("7. Infinite recursion!")

Python Output:
Input: [1, 'hello', [2, 'world']]
RecursionError: maximum recursion depth exceeded while calling a Python object

This happens because:
1. 'hello' is a string
2. Strings are sequences in Python
3. We recurse on 'hello'
4. 'hello'[0] is 'h', which is also a string
5. We recurse on 'h'
6. 'h'[0] is 'h' again!
7. Infinite recursion!
Diagnostic Analysis: The String Recursion Trap
The complete execution trace (before stack overflow):
flatten_sequences_broken([1, "hello", [2, "world"]])
→ item=1 (not sequence) → append 1
→ item="hello" (is str, matches isinstance check) → recurse
    flatten_sequences_broken("hello")
    → item="h" (is str, matches isinstance check) → recurse
        flatten_sequences_broken("h")
        → item="h" (is str, matches isinstance check) → recurse
            flatten_sequences_broken("h")
            → item="h" (is str, matches isinstance check) → recurse
            [... 996 more times ...]
RecursionError: maximum recursion depth exceeded
Let's parse this failure piece by piece:

- The error message: RecursionError: maximum recursion depth exceeded. What this tells us: we hit Python's recursion limit (default 1000).
- The call stack depth: the actual depth is 1000, while the expected depth should be proportional to the nesting depth (2-3 levels here). Key observation: depth grows without bound because strings recurse infinitely.
- Variable values at failure: item = "h", and isinstance(item, (list, tuple, str)) = True. What this tells us: single-character strings still match our type check.
- Critical insight: "h"[0] returns "h", creating a cycle.
- The recursive call pattern: we expected to recurse on containers until we hit non-containers; in fact, strings are containers of strings, so we never hit a base case. Key difference: there is no natural base case for strings.

Root cause identified: Strings are sequences but shouldn't be treated as containers to flatten.
Why the current approach can't solve this: Type checking alone isn't enough; we need a semantic understanding of what "atomic" means.
What we need: Explicit exclusion of string types from the recursion logic.
The Fix: Explicit String Exclusion
Our corrected version already handles this:
# Our working version excludes strings
def flatten_sequences(nested):
    result = []
    for item in nested:
        # Only recurse on list and tuple, NOT str
        if isinstance(item, (list, tuple)):
            flattened = flatten_sequences(item)
            result.extend(flattened)
        else:
            result.append(item)
    return result

# Verify it works
test = [1, "hello", [2, "world", (3, "!")]]
print(f"Input: {test}")
result = flatten_sequences(test)
print(f"Output: {result}")

Python Output:
Input: [1, 'hello', [2, 'world', (3, '!')]]
Output: [1, 'hello', 2, 'world', 3, '!']
Verification: Strings remain intact, sequences are flattened.
Current Limitation
This solves sequences, but we still can't handle our original problem: dictionaries. Let's see what happens:
# Test with our config structure
simple_config = {
    "database": {
        "host": "localhost",
        "credentials": {
            "password": "secret123"
        }
    }
}

print(f"Input: {simple_config}")
result = flatten_sequences(simple_config)
print(f"Output: {result}")

Python Output:
Input: {'database': {'host': 'localhost', 'credentials': {'password': 'secret123'}}}
Output: ['database']

Problem: Iterating over a dictionary yields its keys, so the top-level key 'database' (a string) is appended as an atomic value, and the entire nested structure below it, values and all, is silently discarded. Dictionaries are iterable, but treating them as sequences loses their structure.
What we need: Special handling for dictionaries that preserves their key-value structure while recursing on values.
Iteration 2: Adding Dictionary Support
The Challenge: Dictionaries Are Different
Dictionaries require fundamentally different processing:
- Lists/tuples: Flatten items into a single sequence
- Dictionaries: Preserve key-value structure, recurse on values
Let's build a function that handles both:
def process_nested_structure(data):
    """
    Process nested structures, handling lists, tuples, and dicts.
    Returns the same structure type with processed contents.
    """
    if isinstance(data, dict):
        # For dicts: preserve structure, recurse on values
        result = {}
        for key, value in data.items():
            result[key] = process_nested_structure(value)
        return result
    elif isinstance(data, (list, tuple)):
        # For sequences: recurse on items, preserve type
        processed = [process_nested_structure(item) for item in data]
        # Return same type as input (list or tuple)
        return type(data)(processed)
    else:
        # Base case: atomic value (int, str, bool, None, etc.)
        return data

# Test with mixed structure
mixed = {
    "numbers": [1, 2, [3, 4]],
    "nested": {
        "deep": {
            "value": 42
        }
    },
    "tuple": (1, (2, 3))
}

import json

result = process_nested_structure(mixed)
print("Input:")
print(json.dumps(mixed, indent=2, default=str))
print("\nOutput:")
print(json.dumps(result, indent=2, default=str))
Python Output:
Input:
{
"numbers": [
1,
2,
[
3,
4
]
],
"nested": {
"deep": {
"value": 42
}
},
"tuple": "(1, (2, 3))"
}
Output:
{
"numbers": [
1,
2,
[
3,
4
]
],
"nested": {
"deep": {
"value": 42
}
},
"tuple": "(1, (2, 3))"
}
Observation: The structure is preserved, but we're not actually doing anything with the values yet. This is a structure-preserving traversal, the foundation for any nested data processing.
Applying Transformations: Masking Passwords
Now let's use this framework to solve our original problem:
def mask_sensitive_data(data, sensitive_keys=None):
    """
    Recursively traverse nested structure and mask sensitive values.

    Args:
        data: The nested structure to process
        sensitive_keys: Set of keys to mask
            (default: {"password", "secret", "token", "api_key"})

    Returns:
        New structure with sensitive values masked
    """
    if sensitive_keys is None:
        sensitive_keys = {"password", "secret", "token", "api_key"}
    if isinstance(data, dict):
        result = {}
        for key, value in data.items():
            # Check if this key is sensitive
            if key.lower() in sensitive_keys:
                result[key] = "***MASKED***"
            else:
                # Recurse on the value
                result[key] = mask_sensitive_data(value, sensitive_keys)
        return result
    elif isinstance(data, (list, tuple)):
        # Recurse on sequence items
        processed = [mask_sensitive_data(item, sensitive_keys) for item in data]
        return type(data)(processed)
    else:
        # Base case: return atomic value unchanged
        return data

# Test with our config
config = {
    "database": {
        "primary": {
            "host": "db1.example.com",
            "port": 5432,
            "credentials": {
                "username": "admin",
                "password": "secret123"
            }
        },
        "replicas": [
            {
                "host": "db2.example.com",
                "password": "replica_pass"
            }
        ]
    },
    "api_keys": {
        "stripe": "sk_live_abc123",
        "sendgrid": "SG.xyz789"
    }
}

masked = mask_sensitive_data(config)
print(json.dumps(masked, indent=2))
Python Output:
{
  "database": {
    "primary": {
      "host": "db1.example.com",
      "port": 5432,
      "credentials": {
        "username": "admin",
        "password": "***MASKED***"
      }
    },
    "replicas": [
      {
        "host": "db2.example.com",
        "password": "***MASKED***"
      }
    ]
  },
  "api_keys": {
    "stripe": "sk_live_abc123",
    "sendgrid": "SG.xyz789"
  }
}
Success! Passwords are masked at any depth, in both dicts and lists.
Execution Trace: Following the Recursion
Let's trace a simplified example to see the recursion in action:
# Simplified example for tracing
simple = {
    "user": "admin",
    "password": "secret",
    "settings": {
        "theme": "dark",
        "api_key": "abc123"
    }
}

def mask_with_trace(data, sensitive_keys=None, depth=0):
    """Version with execution tracing"""
    indent = "  " * depth
    if sensitive_keys is None:
        sensitive_keys = {"password", "api_key"}
    if isinstance(data, dict):
        print(f"{indent}Processing dict with keys: {list(data.keys())}")
        result = {}
        for key, value in data.items():
            if key.lower() in sensitive_keys:
                print(f"{indent}  Key '{key}' is sensitive → masking")
                result[key] = "***MASKED***"
            else:
                print(f"{indent}  Key '{key}' → recursing on value")
                result[key] = mask_with_trace(value, sensitive_keys, depth + 1)
        return result
    else:
        print(f"{indent}Base case: {repr(data)}")
        return data

print("Execution trace:")
result = mask_with_trace(simple)
print("\nResult:")
print(json.dumps(result, indent=2))
Python Output:
Execution trace:
Processing dict with keys: ['user', 'password', 'settings']
  Key 'user' → recursing on value
  Base case: 'admin'
  Key 'password' is sensitive → masking
  Key 'settings' → recursing on value
  Processing dict with keys: ['theme', 'api_key']
    Key 'theme' → recursing on value
    Base case: 'dark'
    Key 'api_key' is sensitive → masking

Result:
{
  "user": "admin",
  "password": "***MASKED***",
  "settings": {
    "theme": "dark",
    "api_key": "***MASKED***"
  }
}
Key observations:
1. We recurse on dict values, not keys
2. Sensitive keys are detected at any depth
3. Base cases (strings, numbers) stop the recursion
4. The structure is preserved throughout
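One practical refinement worth sketching here: since each nesting level adds a stack frame, a hostile or corrupted config can blow the recursion limit. The `max_depth` parameter below is our addition (not part of mask_sensitive_data as written in this chapter); it fails loudly with a clear error instead of a RecursionError deep in the stack:

```python
def mask_with_limit(data, sensitive_keys=frozenset({"password"}), max_depth=50):
    """Like mask_sensitive_data, but refuses to recurse past max_depth levels."""
    if max_depth < 0:
        raise ValueError("structure nested too deeply")
    if isinstance(data, dict):
        return {
            key: ("***MASKED***" if key.lower() in sensitive_keys
                  else mask_with_limit(value, sensitive_keys, max_depth - 1))
            for key, value in data.items()
        }
    if isinstance(data, (list, tuple)):
        return type(data)(
            mask_with_limit(item, sensitive_keys, max_depth - 1) for item in data
        )
    return data

print(mask_with_limit({"password": "x", "a": {"b": 1}}))
```

Fifty levels is far deeper than any sane configuration file; hitting the limit almost certainly means bad input, not a bad limit.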
Current Limitation: Case Sensitivity
Notice we used key.lower() in our check. What if the config uses different casing?
# Test with mixed casing
mixed_case = {
    "Password": "secret1",
    "PASSWORD": "secret2",
    "password": "secret3",
    "passWord": "secret4"
}

result = mask_sensitive_data(mixed_case)
print(json.dumps(result, indent=2))

Python Output:
{
  "Password": "***MASKED***",
  "PASSWORD": "***MASKED***",
  "password": "***MASKED***",
  "passWord": "***MASKED***"
}
Good! Our key.lower() handles this correctly. But what about partial matches?
# Test with partial matches
partial = {
    "user_password": "secret1",
    "db_password_hash": "secret2",
    "password_reset_token": "secret3",
    "old_password": "secret4"
}

result = mask_sensitive_data(partial)
print(json.dumps(result, indent=2))

Python Output:
{
  "user_password": "secret1",
  "db_password_hash": "secret2",
  "password_reset_token": "secret3",
  "old_password": "secret4"
}
Problem: None of these were masked! Our exact match check misses compound keys.
What we need: Pattern matching that can detect sensitive substrings in keys.
Iteration 3: Pattern Matching and Flexibility
The Enhanced Implementation
Let's add pattern matching to catch more sensitive data:
def mask_sensitive_data_advanced(data, sensitive_patterns=None):
    """
    Recursively mask sensitive data with pattern matching.

    Args:
        data: Nested structure to process
        sensitive_patterns: List of patterns to match (substrings)

    Returns:
        New structure with sensitive values masked
    """
    if sensitive_patterns is None:
        sensitive_patterns = ["password", "secret", "token", "api_key", "private_key"]

    def is_sensitive(key):
        """Check if key contains any sensitive pattern"""
        key_lower = key.lower()
        return any(pattern in key_lower for pattern in sensitive_patterns)

    if isinstance(data, dict):
        result = {}
        for key, value in data.items():
            if is_sensitive(key):
                result[key] = "***MASKED***"
            else:
                result[key] = mask_sensitive_data_advanced(value, sensitive_patterns)
        return result
    elif isinstance(data, (list, tuple)):
        processed = [mask_sensitive_data_advanced(item, sensitive_patterns) for item in data]
        return type(data)(processed)
    else:
        return data

# Test with compound keys
compound = {
    "user_password": "secret1",
    "db_password_hash": "secret2",
    "password_reset_token": "secret3",
    "old_password": "secret4",
    "username": "admin",  # Should NOT be masked
    "api_key_production": "key123"
}

result = mask_sensitive_data_advanced(compound)
print(json.dumps(result, indent=2))

Python Output:
{
  "user_password": "***MASKED***",
  "db_password_hash": "***MASKED***",
  "password_reset_token": "***MASKED***",
  "old_password": "***MASKED***",
  "username": "admin",
  "api_key_production": "***MASKED***"
}
Success! Now we catch all password-related keys, including compound names.
Adding Custom Masking Functions
Sometimes you want different masking strategies for different data types:
def mask_with_strategy(data, sensitive_patterns=None, mask_fn=None):
    """
    Recursively mask sensitive data with custom masking function.

    Args:
        data: Nested structure to process
        sensitive_patterns: List of patterns to match
        mask_fn: Function to apply to sensitive values
            (default: replace with "***MASKED***")

    Returns:
        New structure with sensitive values masked
    """
    if sensitive_patterns is None:
        sensitive_patterns = ["password", "secret", "token", "api_key"]
    if mask_fn is None:
        mask_fn = lambda x: "***MASKED***"

    def is_sensitive(key):
        key_lower = key.lower()
        return any(pattern in key_lower for pattern in sensitive_patterns)

    if isinstance(data, dict):
        result = {}
        for key, value in data.items():
            if is_sensitive(key):
                result[key] = mask_fn(value)
            else:
                result[key] = mask_with_strategy(value, sensitive_patterns, mask_fn)
        return result
    elif isinstance(data, (list, tuple)):
        processed = [mask_with_strategy(item, sensitive_patterns, mask_fn) for item in data]
        return type(data)(processed)
    else:
        return data

# Example: Show first 3 characters of sensitive strings
def partial_mask(value):
    if isinstance(value, str) and len(value) > 3:
        return value[:3] + "***"
    return "***"

config = {
    "database": {
        "password": "secret123456",
        "api_key": "sk_live_abcdefghijk"
    },
    "user": "admin"
}

result = mask_with_strategy(config, mask_fn=partial_mask)
print(json.dumps(result, indent=2))

Python Output:
{
  "database": {
    "password": "sec***",
    "api_key": "sk_***"
  },
  "user": "admin"
}
Flexibility achieved: Now we can customize how values are masked while maintaining the recursive structure traversal.
When to Apply This Solution
What it optimizes for:
- Flexibility: Handle any nested structure depth
- Extensibility: Easy to add new patterns or masking strategies
- Correctness: Preserves structure while transforming values

What it sacrifices:
- Performance: Creates new data structures (not in-place)
- Memory: Call stack depth proportional to nesting depth

When to choose this approach:
- Configuration file processing
- JSON/API response sanitization
- Logging sensitive data
- Data export/import transformations
- Any tree-like data structure traversal

When to avoid this approach:
- Very deep nesting (>100 levels): risk of stack overflow
- Performance-critical hot paths: consider iterative with explicit stack
- In-place modification required: need different algorithm

Complexity characteristics:
- Time Complexity: O(n) where n is the total number of values in the structure
- Space Complexity: O(d) where d is the maximum nesting depth (call stack)
- Recurrence Relation: T(n) = k·T(n/k) + O(1) where k is the branching factor
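For the "very deep nesting" case, the explicit-stack alternative mentioned above looks like this. A sketch (it handles lists and tuples only, mirroring flatten_sequences); the trade is recursion-limit safety for slightly less obvious bookkeeping:

```python
def flatten_iterative(nested):
    """Flatten nested lists/tuples without recursion, using an explicit stack."""
    result = []
    stack = [nested]
    while stack:
        item = stack.pop()
        if isinstance(item, (list, tuple)):
            # Push children in reverse so they pop back off in original order
            stack.extend(reversed(item))
        else:
            result.append(item)
    return result

print(flatten_iterative([1, [2, [3, [4]]], 5]))  # [1, 2, 3, 4, 5]
```

Because the stack lives on the heap, this version handles nesting thousands of levels deep, well past Python's default recursion limit of 1000.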
Flattening vs. Transforming: Different Goals
We've been building structure-preserving transformations. But sometimes you actually want to flatten nested structures into a single level. Let's explore both approaches.
True Flattening: Nested Lists to Flat List
This is the classic "flatten" operation:
def flatten_completely(nested):
    """
    Flatten any nested structure of lists/tuples into a single flat list.
    All nesting is removed.
    """
    result = []
    if isinstance(nested, (list, tuple)):
        for item in nested:
            # Recurse on each item
            flattened = flatten_completely(item)
            result.extend(flattened)
    else:
        # Base case: atomic value
        result.append(nested)
    return result

# Test with deeply nested structure
deeply_nested = [1, [2, [3, [4, [5, [6, [7]]]]]]]
print(f"Input: {deeply_nested}")
print(f"Output: {flatten_completely(deeply_nested)}")

# Test with mixed nesting
mixed = [1, [2, 3], [[4, 5], [6, [7, 8]]], 9]
print(f"\nInput: {mixed}")
print(f"Output: {flatten_completely(mixed)}")

Python Output:
Input: [1, [2, [3, [4, [5, [6, [7]]]]]]]
Output: [1, 2, 3, 4, 5, 6, 7]

Input: [1, [2, 3], [[4, 5], [6, [7, 8]]], 9]
Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]
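A generator variant of the same idea, using `yield from`, avoids building intermediate lists at every level and lets callers consume atoms lazily. This is a sketch along the same lines as flatten_completely, not a replacement for it:

```python
def iter_flatten(nested):
    """Lazily yield atomic values from arbitrarily nested lists/tuples."""
    if isinstance(nested, (list, tuple)):
        for item in nested:
            # Delegate to the nested generator; each atom bubbles up directly
            yield from iter_flatten(item)
    else:
        yield nested

print(list(iter_flatten([1, [2, [3, [4]]], 5])))  # [1, 2, 3, 4, 5]
```

This shines when you only need the first few atoms, or when the flattened result would be too large to hold in memory at once.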
Flattening Dictionaries: Key Paths
For dictionaries, "flattening" means converting nested keys into dot-notation paths:
def flatten_dict(nested_dict, parent_key='', separator='.'):
    """
    Flatten nested dictionary into single-level dict with dot-notation keys.

    Example:
        {"a": {"b": {"c": 1}}} → {"a.b.c": 1}
    """
    items = []
    for key, value in nested_dict.items():
        # Build the new key path
        new_key = f"{parent_key}{separator}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse on nested dict
            items.extend(flatten_dict(value, new_key, separator).items())
        else:
            # Base case: add the flattened key-value pair
            items.append((new_key, value))
    return dict(items)

# Test with nested config
config = {
    "database": {
        "primary": {
            "host": "db1.example.com",
            "port": 5432
        },
        "replica": {
            "host": "db2.example.com",
            "port": 5432
        }
    },
    "cache": {
        "redis": {
            "host": "cache.example.com"
        }
    }
}

flat = flatten_dict(config)
for key, value in sorted(flat.items()):
    print(f"{key}: {value}")
Python Output:
cache.redis.host: cache.example.com
database.primary.host: db1.example.com
database.primary.port: 5432
database.replica.host: db2.example.com
database.replica.port: 5432
Use case: This is extremely useful for:
- Configuration file validation
- Environment variable generation
- Logging structured data
- Database column mapping
Execution Trace: Dictionary Flattening
Let's trace the execution to understand the key-building process:
def flatten_dict_traced(nested_dict, parent_key='', separator='.', depth=0):
    """Version with execution tracing"""
    indent = "  " * depth
    print(f"{indent}flatten_dict(parent_key='{parent_key}')")
    items = []
    for key, value in nested_dict.items():
        new_key = f"{parent_key}{separator}{key}" if parent_key else key
        print(f"{indent}  Processing key '{key}' → new_key='{new_key}'")
        if isinstance(value, dict):
            print(f"{indent}  Value is dict → recursing")
            items.extend(flatten_dict_traced(value, new_key, separator, depth + 1).items())
        else:
            print(f"{indent}  Value is {type(value).__name__} → adding ('{new_key}', {repr(value)})")
            items.append((new_key, value))
    return dict(items)

# Trace a simple example
simple = {
    "db": {
        "host": "localhost",
        "port": 5432
    },
    "debug": True
}

print("Execution trace:")
result = flatten_dict_traced(simple)
print("\nResult:")
for key, value in sorted(result.items()):
    print(f"  {key}: {value}")
Python Output:
Execution trace:
flatten_dict(parent_key='')
  Processing key 'db' → new_key='db'
  Value is dict → recursing
  flatten_dict(parent_key='db')
    Processing key 'host' → new_key='db.host'
    Value is str → adding ('db.host', 'localhost')
    Processing key 'port' → new_key='db.port'
    Value is int → adding ('db.port', 5432)
  Processing key 'debug' → new_key='debug'
  Value is bool → adding ('debug', True)

Result:
  db.host: localhost
  db.port: 5432
  debug: True
Unflattening: The Reverse Operation
Once you can flatten, you often need to unflatten:
def unflatten_dict(flat_dict, separator='.'):
    """
    Convert flattened dictionary back to nested structure.

    Example:
        {"a.b.c": 1} → {"a": {"b": {"c": 1}}}
    """
    result = {}
    for flat_key, value in flat_dict.items():
        # Split the key into parts
        parts = flat_key.split(separator)
        # Navigate/create the nested structure
        current = result
        for part in parts[:-1]:  # All but the last part
            if part not in current:
                current[part] = {}
            current = current[part]
        # Set the final value
        current[parts[-1]] = value
    return result

# Test round-trip
original = {
    "database": {
        "primary": {
            "host": "db1.example.com",
            "port": 5432
        }
    }
}

print("Original:")
print(json.dumps(original, indent=2))

flat = flatten_dict(original)
print("\nFlattened:")
for key, value in sorted(flat.items()):
    print(f"  {key}: {value}")

restored = unflatten_dict(flat)
print("\nRestored:")
print(json.dumps(restored, indent=2))

print("\nRound-trip successful:", original == restored)
Python Output:
Original:
{
  "database": {
    "primary": {
      "host": "db1.example.com",
      "port": 5432
    }
  }
}

Flattened:
  database.primary.host: db1.example.com
  database.primary.port: 5432

Restored:
{
  "database": {
    "primary": {
      "host": "db1.example.com",
      "port": 5432
    }
  }
}

Round-trip successful: True
Note: The unflatten_dict function is not recursiveβit uses iteration to build the nested structure. This demonstrates that not every operation on nested data requires recursion. Choose the approach that makes the logic clearest.
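One caveat worth knowing: the round-trip guarantee holds only when no key contains the separator character. A minimal demonstration (we redefine a small unflatten here so the snippet stands alone; it follows the same logic as unflatten_dict above):

```python
def unflatten(flat, separator='.'):
    """Minimal unflatten: splits keys on the separator and nests accordingly."""
    result = {}
    for flat_key, value in flat.items():
        parts = flat_key.split(separator)
        current = result
        for part in parts[:-1]:
            current = current.setdefault(part, {})
        current[parts[-1]] = value
    return result

# A literal key containing the separator gets reinterpreted as nesting:
print(unflatten({"a.b": 1}))  # {'a': {'b': 1}}, NOT {'a.b': 1}
```

If your keys can legitimately contain dots, pick a separator that cannot appear in keys, or escape the separator before flattening.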
Real-World Application: JSON Path Queries
Let's build something practical: a JSON path query system that can extract values from nested structures using path expressions like "database.primary.host".
The Query System
def get_nested_value(data, path, separator='.', default=None):
    """
    Extract value from nested structure using dot-notation path.

    Args:
        data: Nested dict/list structure
        path: Dot-separated path (e.g., "database.primary.host")
        separator: Path separator (default: '.')
        default: Value to return if path not found

    Returns:
        Value at path, or default if not found
    """
    parts = path.split(separator)
    current = data
    for part in parts:
        if isinstance(current, dict):
            if part not in current:
                return default
            current = current[part]
        elif isinstance(current, list):
            # Handle list indices
            try:
                index = int(part)
                current = current[index]
            except (ValueError, IndexError):
                return default
        else:
            # Can't navigate further
            return default
    return current

# Test with complex structure
config = {
    "database": {
        "primary": {
            "host": "db1.example.com",
            "port": 5432
        },
        "replicas": [
            {"host": "db2.example.com", "port": 5432},
            {"host": "db3.example.com", "port": 5432}
        ]
    },
    "cache": {
        "redis": {
            "nodes": [
                {"host": "cache1.example.com"},
                {"host": "cache2.example.com"}
            ]
        }
    }
}

# Test various paths
paths = [
    "database.primary.host",
    "database.primary.port",
    "database.replicas.0.host",
    "database.replicas.1.host",
    "cache.redis.nodes.0.host",
    "database.nonexistent",
    "database.replicas.99.host"
]

for path in paths:
    value = get_nested_value(config, path, default="NOT FOUND")
    print(f"{path:35} → {value}")
Python Output:
database.primary.host               → db1.example.com
database.primary.port               → 5432
database.replicas.0.host            → db2.example.com
database.replicas.1.host            → db3.example.com
cache.redis.nodes.0.host            → cache1.example.com
database.nonexistent                → NOT FOUND
database.replicas.99.host           → NOT FOUND
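Note that get_nested_value is iterative: a path is a linear sequence of steps, so a loop is the natural fit. For symmetry with the rest of the chapter, here is an equivalent recursive formulation, offered as an illustrative sketch rather than a replacement (the name and the pre-split `parts` argument are our choices):

```python
def get_nested_recursive(data, parts, default=None):
    """Recursive twin of get_nested_value; `parts` is a pre-split path list."""
    if not parts:
        # Base case: path exhausted, we've arrived at the value
        return data
    head, rest = parts[0], parts[1:]
    if isinstance(data, dict) and head in data:
        return get_nested_recursive(data[head], rest, default)
    if isinstance(data, list):
        try:
            return get_nested_recursive(data[int(head)], rest, default)
        except (ValueError, IndexError):
            return default
    return default

print(get_nested_recursive({"a": {"b": [10, 20]}}, "a.b.1".split(".")))  # 20
```

Each recursive call consumes one path segment, so the base case (an empty path) is guaranteed to arrive after len(parts) steps.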
Setting Values in Nested Structures
Now let's build the setter. Like the getter, it navigates the path iteratively, creating intermediate dicts along the way:
def set_nested_value(data, path, value, separator='.', create_missing=True):
    """
    Set value in nested structure using dot-notation path.
    Creates intermediate dicts if they don't exist.

    Args:
        data: Nested dict structure (modified in place)
        path: Dot-separated path
        value: Value to set
        separator: Path separator
        create_missing: Whether to create missing intermediate dicts

    Returns:
        True if successful, False if path invalid
    """
    parts = path.split(separator)
    # Navigate to parent of target
    current = data
    for part in parts[:-1]:
        if part not in current:
            if create_missing:
                current[part] = {}
            else:
                return False
        current = current[part]
        if not isinstance(current, dict):
            return False
    # Set the final value
    current[parts[-1]] = value
    return True

# Test setting values
config = {}

# Set some nested values
set_nested_value(config, "database.primary.host", "db1.example.com")
set_nested_value(config, "database.primary.port", 5432)
set_nested_value(config, "cache.redis.host", "cache.example.com")

print(json.dumps(config, indent=2))
Python Output:
{
  "database": {
    "primary": {
      "host": "db1.example.com",
      "port": 5432
    }
  },
  "cache": {
    "redis": {
      "host": "cache.example.com"
    }
  }
}
Recursive Path Finding: All Paths to a Value
Let's build a function that finds all paths to a specific value:
def find_all_paths(data, target_value, current_path=''):
"""
Find all paths in nested structure that lead to target value.
Args:
data: Nested structure to search
target_value: Value to find
current_path: Current path (used in recursion)
Returns:
List of paths (as strings) where value was found
"""
paths = []
if isinstance(data, dict):
for key, value in data.items():
new_path = f"{current_path}.{key}" if current_path else key
if value == target_value:
paths.append(new_path)
# Recurse on the value
paths.extend(find_all_paths(value, target_value, new_path))
elif isinstance(data, list):
for index, item in enumerate(data):
new_path = f"{current_path}[{index}]"
if item == target_value:
paths.append(new_path)
# Recurse on the item
paths.extend(find_all_paths(item, target_value, new_path))
return paths
# Test with structure containing duplicate values
config = {
"database": {
"primary": {
"host": "db.example.com",
"port": 5432
},
"replica": {
"host": "db.example.com", # Same host!
"port": 5432
}
},
"cache": {
"port": 5432 # Same port!
}
}
# Find all occurrences of port 5432
paths = find_all_paths(config, 5432)
print("All paths to value 5432:")
for path in paths:
print(f" {path}")
# Find all occurrences of the host
paths = find_all_paths(config, "db.example.com")
print("\nAll paths to value 'db.example.com':")
for path in paths:
print(f" {path}")
Python Output:
All paths to value 5432:
database.primary.port
database.replica.port
cache.port
All paths to value 'db.example.com':
database.primary.host
database.replica.host
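find_all_paths returns only the paths. When the matched values are needed as well, the same traversal can be phrased as a generator; this iter_paths helper is a sketch, not a function defined elsewhere in the chapter:

```python
def iter_paths(data, current_path=''):
    """Yield (path, value) pairs for every atomic value, using the same
    path notation as find_all_paths (dots for keys, [i] for indices)."""
    if isinstance(data, dict):
        for key, value in data.items():
            new_path = f"{current_path}.{key}" if current_path else key
            yield from iter_paths(value, new_path)
    elif isinstance(data, list):
        for index, item in enumerate(data):
            yield from iter_paths(item, f"{current_path}[{index}]")
    else:
        yield current_path, data

config = {"cache": {"port": 5432}, "tags": ["a", "b"]}
matches = [path for path, value in iter_paths(config) if value == 5432]
# matches == ['cache.port']
```

Because it is lazy, iter_paths can stop at the first match without visiting the rest of a large structure.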
Execution Trace: Path Finding
Let's trace the path-finding recursion:
def find_all_paths_traced(data, target_value, current_path='', depth=0):
"""Version with execution tracing"""
indent = " " * depth
print(f"{indent}find_all_paths(path='{current_path}', target={repr(target_value)})")
paths = []
if isinstance(data, dict):
print(f"{indent} Type: dict with keys {list(data.keys())}")
for key, value in data.items():
new_path = f"{current_path}.{key}" if current_path else key
            print(f"{indent}  Checking key '{key}' → path='{new_path}'")
            if value == target_value:
                print(f"{indent}    ✓ MATCH! Adding path '{new_path}'")
paths.append(new_path)
paths.extend(find_all_paths_traced(value, target_value, new_path, depth + 1))
elif isinstance(data, list):
print(f"{indent} Type: list with {len(data)} items")
for index, item in enumerate(data):
new_path = f"{current_path}[{index}]"
            print(f"{indent}  Checking index {index} → path='{new_path}'")
            if item == target_value:
                print(f"{indent}    ✓ MATCH! Adding path '{new_path}'")
paths.append(new_path)
paths.extend(find_all_paths_traced(item, target_value, new_path, depth + 1))
else:
print(f"{indent} Type: {type(data).__name__} = {repr(data)}")
return paths
# Trace a simple example
simple = {
"a": 1,
"b": {
"c": 1,
"d": 2
}
}
print("Execution trace:")
paths = find_all_paths_traced(simple, 1)
print(f"\nFound paths: {paths}")
Python Output:
Execution trace:
find_all_paths(path='', target=1)
  Type: dict with keys ['a', 'b']
  Checking key 'a' → path='a'
    ✓ MATCH! Adding path 'a'
  find_all_paths(path='a', target=1)
    Type: int = 1
  Checking key 'b' → path='b'
  find_all_paths(path='b', target=1)
    Type: dict with keys ['c', 'd']
    Checking key 'c' → path='b.c'
      ✓ MATCH! Adding path 'b.c'
    find_all_paths(path='b.c', target=1)
      Type: int = 1
    Checking key 'd' → path='b.d'
    find_all_paths(path='b.d', target=1)
      Type: int = 2
Found paths: ['a', 'b.c']
Key insight: The recursion naturally handles arbitrary nesting depth. We don't need to know how deep the structure is; the recursion explores every branch until it hits base cases (non-dict, non-list values).
Common Failure Modes and Their Signatures
Symptom: Infinite Recursion on Strings
Python output pattern:
RecursionError: maximum recursion depth exceeded while calling a Python object
Diagnostic clues:
- Traceback shows same function called repeatedly with single-character strings
- Call stack depth reaches ~1000 (Python's default limit)
- Last few calls show: flatten("h") → flatten("h") → flatten("h")
Root cause: Strings are sequences in Python, so isinstance(item, (list, tuple, str)) matches strings. Since "h"[0] returns "h", recursion never terminates.
Solution: Explicitly exclude strings from sequence types: isinstance(item, (list, tuple)) only.
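As a minimal illustration of that fix, here is a flatten with the exclusion called out (flatten_no_strings is our illustrative name; it follows the same pattern as the chapter's flatten_completely()):

```python
def flatten_no_strings(data):
    """Flatten arbitrarily nested lists/tuples into a flat list.
    Strings are deliberately excluded from the sequence check:
    without this, flatten("h") recurses forever because "h"[0] == "h"."""
    result = []
    for item in data:
        if isinstance(item, (list, tuple)):   # NOT str
            result.extend(flatten_no_strings(item))
        else:
            result.append(item)
    return result

flatten_no_strings([1, ["ab", (2, 3)]])
# → [1, 'ab', 2, 3]   ("ab" stays atomic)
```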
Symptom: Dictionary Keys Flattened Instead of Values
Python output pattern:
['database', 'host', 'credentials', 'password']
Diagnostic clues:
- Output is a list of strings (the keys)
- Original nested structure is lost
- Values are missing entirely
Root cause: Dictionaries are iterable over their keys by default. When treated as a sequence, for item in dict iterates over keys, not key-value pairs.
Solution: Use isinstance(data, dict) check and iterate with data.items() to access both keys and values.
Symptom: Type Mismatch After Processing
Python output pattern:
TypeError: 'tuple' object does not support item assignment
Diagnostic clues:
- Error occurs when trying to modify the result
- Input was a tuple, output is a list
- Type preservation was expected but not maintained
Root cause: Processing converts all sequences to lists: [process(item) for item in data] always returns a list.
Solution: Preserve input type: type(data)([process(item) for item in data]) returns same type as input.
Symptom: Missing Nested Values
Python output pattern:
{
"database": {
"password": "secret123"
},
"api_keys": {
"stripe": "sk_live_abc123"
}
}
Diagnostic clues:
- Some sensitive keys are masked, others are not
- Unmasked keys contain sensitive patterns but don't match exactly
- Keys like "user_password" or "api_key_production" are missed
Root cause: Exact string matching (key == "password") misses compound keys containing the sensitive pattern.
Solution: Use substring matching: any(pattern in key.lower() for pattern in sensitive_patterns).
Symptom: Stack Overflow on Deep Nesting
Python output pattern:
RecursionError: maximum recursion depth exceeded in comparison
Diagnostic clues:
- Occurs with very deeply nested structures (>1000 levels)
- Traceback shows legitimate recursive calls, not an infinite loop
- Structure is valid but exceeds Python's recursion limit
Root cause: Python's default recursion limit (1000) is exceeded by legitimate deep nesting.
Solution: Either increase limit with sys.setrecursionlimit() (risky) or convert to iterative approach with explicit stack.
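The iterative conversion can be sketched with an explicit stack of iterators; shown here for flattening, though the same pattern applies to any traversal in this chapter:

```python
def flatten_iterative(data):
    """Flatten nested lists/tuples with an explicit stack instead of
    recursion: depth is bounded by memory, not sys.getrecursionlimit()."""
    result = []
    stack = [iter(data)]            # one iterator per open container
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            stack.pop()             # finished this container, go back up
            continue
        if isinstance(item, (list, tuple)):
            stack.append(iter(item))
        else:
            result.append(item)
    return result

# 5000 levels deep: the recursive version would raise RecursionError
deep = [1]
for _ in range(5000):
    deep = [deep]
flatten_iterative(deep)
# → [1]
```

Keeping one iterator per level (rather than copying remaining items onto the stack) preserves left-to-right order without extra allocation.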
Symptom: Circular Reference Infinite Loop
Python output pattern:
RecursionError: maximum recursion depth exceeded
Diagnostic clues:
- Structure contains circular references (e.g., a["self"] = a)
- Traceback shows same object being processed repeatedly
- Memory address in repr() shows same object: <dict at 0x7f8b8c>
Root cause: Recursive traversal doesn't track visited objects, so circular references cause infinite loops.
Solution: Maintain a visited set of object IDs:
def process_with_cycle_detection(data, visited=None):
"""Process nested structure with circular reference detection"""
if visited is None:
visited = set()
# Check if we've seen this object before
obj_id = id(data)
if obj_id in visited:
return "[CIRCULAR REFERENCE]"
# Mark as visited
visited.add(obj_id)
try:
if isinstance(data, dict):
result = {}
for key, value in data.items():
result[key] = process_with_cycle_detection(value, visited)
return result
elif isinstance(data, (list, tuple)):
processed = [process_with_cycle_detection(item, visited) for item in data]
return type(data)(processed)
else:
return data
finally:
# Remove from visited when backtracking (allows same object in different branches)
visited.remove(obj_id)
# Test with circular reference
circular = {"a": 1}
circular["self"] = circular # Circular reference!
result = process_with_cycle_detection(circular)
print(json.dumps(result, indent=2))
Python Output:
{
"a": 1,
"self": "[CIRCULAR REFERENCE]"
}
Debugging Workflow: When Your Nested Traversal Fails
Step 1: Identify the Failure Mode
What to look for in Python output:
- RecursionError: Infinite recursion (check for string handling or circular references)
- TypeError: Type mismatch (check type preservation or dict vs list handling)
- KeyError/IndexError: Path navigation failed (check bounds and existence)
- Wrong output: Logic error (trace execution with print statements)
Step 2: Add Strategic Print Statements
Where to place prints to visualize recursion:
def debug_nested_traversal(data, depth=0):
"""Template for debugging nested traversal"""
indent = " " * depth
# Print entry point
print(f"{indent}→ Entering: type={type(data).__name__}, depth={depth}")
if isinstance(data, dict):
print(f"{indent} Dict with keys: {list(data.keys())}")
result = {}
for key, value in data.items():
print(f"{indent} Processing key: {repr(key)}")
result[key] = debug_nested_traversal(value, depth + 1)
return result
elif isinstance(data, (list, tuple)):
print(f"{indent} {type(data).__name__} with {len(data)} items")
processed = [debug_nested_traversal(item, depth + 1) for item in data]
return type(data)(processed)
else:
print(f"{indent} Base case: {repr(data)}")
return data
Step 3: Manually Trace with Small Input
How to execute the function by hand:
- Start with the smallest possible input that fails
- Write down each recursive call
- Track the call stack depth
- Note when base cases are reached
- Verify return values propagate correctly
Example manual trace:
Input: {"a": {"b": 1}}
Call 1: process({"a": {"b": 1}})
  → isinstance(data, dict) = True
  → Iterate: key="a", value={"b": 1}
  → Recurse: process({"b": 1})
Call 2: process({"b": 1})
  → isinstance(data, dict) = True
  → Iterate: key="b", value=1
  → Recurse: process(1)
Call 3: process(1)
  → isinstance(data, dict) = False
  → isinstance(data, (list, tuple)) = False
  → Base case: return 1
  ← Return {"b": 1}
  ← Return {"a": {"b": 1}}
Step 4: Verify Base Cases
Checklist for base case validation:
- [ ] Do all recursive paths eventually reach a base case?
- [ ] Are strings handled as atomic values (not recursed)?
- [ ] Are None, numbers, booleans handled correctly?
- [ ] Are empty containers ([], {}) handled correctly?
- [ ] Are circular references detected?
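Several of these checks can be verified mechanically. A sketch using a bare identity traversal (identity_process exists only to exercise the checklist; circular references still need the visited-set technique shown earlier):

```python
def identity_process(data):
    """Bare identity traversal used only to exercise the checklist."""
    if isinstance(data, dict):
        return {key: identity_process(value) for key, value in data.items()}
    elif isinstance(data, (list, tuple)):    # strings excluded on purpose
        return type(data)([identity_process(item) for item in data])
    return data

# Strings stay atomic, special values survive, empty containers round-trip
assert identity_process("hello") == "hello"
assert identity_process({"a": None, "b": True, "c": 0}) == {"a": None, "b": True, "c": 0}
assert identity_process({}) == {} and identity_process([]) == []
assert identity_process((1, (2,))) == (1, (2,))
```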
Step 5: Test Edge Cases Systematically
Essential test cases for nested structure processing:
def test_nested_processing(process_fn):
"""Comprehensive test suite for nested structure processing"""
test_cases = [
# Empty structures
({}, "empty dict"),
([], "empty list"),
# Single level
({"a": 1}, "single key-value"),
([1, 2, 3], "flat list"),
# Nested structures
({"a": {"b": 1}}, "nested dict"),
([1, [2, 3]], "nested list"),
# Mixed types
({"a": [1, 2], "b": {"c": 3}}, "mixed dict-list"),
([1, {"a": 2}, [3, 4]], "mixed list-dict"),
# Deep nesting
({"a": {"b": {"c": {"d": 1}}}}, "deep dict"),
([[[[[1]]]]], "deep list"),
# Strings (should not be recursed)
({"a": "hello"}, "string value"),
(["hello", "world"], "string list"),
# Special values
({"a": None, "b": True, "c": 3.14}, "special values"),
# Tuples
((1, (2, 3)), "nested tuple"),
]
print("Running test suite...")
for test_input, description in test_cases:
try:
result = process_fn(test_input)
            print(f"✓ {description:25} → {type(result).__name__}")
except Exception as e:
            print(f"✗ {description:25} → {type(e).__name__}: {e}")
# Test our implementation
test_nested_processing(mask_sensitive_data_advanced)
Python Output:
Running test suite...
✓ empty dict                → dict
✓ empty list                → list
✓ single key-value          → dict
✓ flat list                 → list
✓ nested dict               → dict
✓ nested list               → list
✓ mixed dict-list           → dict
✓ mixed list-dict           → list
✓ deep dict                 → dict
✓ deep list                 → list
✓ string value              → dict
✓ string list               → list
✓ special values            → dict
✓ nested tuple              → tuple
Step 6: Apply the Fix
Decision tree for choosing the right technique:
Is the recursion infinite?
├─ Yes → Check for:
│   ├─ String handling (exclude strings from sequence types)
│   ├─ Circular references (add visited tracking)
│   └─ Missing base case (add explicit checks)
│
└─ No → Is the output wrong?
    ├─ Structure lost → Check dict vs list handling
    ├─ Type changed → Add type preservation
    ├─ Values missing → Check pattern matching
    └─ Wrong values → Trace execution with prints
The Journey: From Problem to Solution
The Complete Evolution
| Iteration | Failure Mode | Technique Applied | Result | Complexity Impact |
|---|---|---|---|---|
| 0 | Only handles lists | Basic list recursion | Works for simple lists | O(n) time, O(d) space |
| 1 | Tuples not flattened | Multi-type checking | Handles list and tuple | O(n) time, O(d) space |
| 1b | Infinite recursion on strings | String exclusion | Strings treated as atomic | O(n) time, O(d) space |
| 2 | Dicts flattened incorrectly | Dict-specific handling | Preserves dict structure | O(n) time, O(d) space |
| 3 | Exact key matching too narrow | Pattern matching | Catches compound keys | O(nΒ·m) time, O(d) space |
| 3b | Inflexible masking | Custom masking functions | Flexible transformations | O(n) time, O(d) space |
where n = total values, d = max depth, m = number of patterns
Final Implementation: Production-Ready Nested Data Processor
def process_nested_data(
data,
transform_fn=None,
key_filter=None,
visited=None,
preserve_types=True
):
    """
    Production-ready nested data processor with full feature set.
    Args:
        data: Nested structure to process (dict, list, tuple, or atomic value)
        transform_fn: Function to apply to atomic values (default: identity).
            When key_filter is provided, transform_fn is applied only to
            values under matching keys; otherwise it is applied to every
            atomic value.
        key_filter: Function to determine if a dict key's value should be
            transformed (receives key, returns bool)
        visited: Set of visited object IDs (for circular reference detection)
        preserve_types: Whether to preserve tuple vs list distinction
    Returns:
        Processed structure with the same shape as the input
    Features:
    - Handles arbitrary nesting depth
    - Detects circular references
    - Preserves or normalizes types
    - Flexible transformation logic
    - Comprehensive error handling
    """
    # Initialize defaults
    if transform_fn is None:
        transform_fn = lambda x: x
    if visited is None:
        visited = set()
    # Circular reference detection
    obj_id = id(data)
    if obj_id in visited:
        return "[CIRCULAR REFERENCE]"
    # Process based on type
    try:
        if isinstance(data, dict):
            visited.add(obj_id)
            result = {}
            for key, value in data.items():
                if key_filter is not None and key_filter(key):
                    # Transform the matched value directly
                    result[key] = transform_fn(value)
                else:
                    # Recurse on the value
                    result[key] = process_nested_data(
                        value, transform_fn, key_filter, visited, preserve_types
                    )
            return result
        elif isinstance(data, (list, tuple)):
            visited.add(obj_id)
            processed = [
                process_nested_data(item, transform_fn, key_filter, visited, preserve_types)
                for item in data
            ]
            # Preserve type if requested
            if preserve_types:
                return type(data)(processed)
            else:
                return processed
        else:
            # Base case: atomic value. Without a key_filter, transform_fn
            # applies to every leaf; with one, only matched keys transform.
            return data if key_filter is not None else transform_fn(data)
finally:
# Clean up visited tracking (allows same object in different branches)
if obj_id in visited:
visited.remove(obj_id)
# Example 1: Mask sensitive data
def mask_value(value):
return "***MASKED***"
def is_sensitive_key(key):
sensitive_patterns = ["password", "secret", "token", "api_key"]
return any(pattern in key.lower() for pattern in sensitive_patterns)
config = {
"database": {
"host": "db.example.com",
"password": "secret123",
"replicas": [
{"host": "db2.example.com", "password": "replica_pass"}
]
}
}
masked = process_nested_data(config, mask_value, is_sensitive_key)
print("Example 1: Masked config")
print(json.dumps(masked, indent=2))
# Example 2: Convert all strings to uppercase
def uppercase(value):
return value.upper() if isinstance(value, str) else value
data = {
"name": "alice",
"settings": {
"theme": "dark",
"language": "en"
}
}
uppercased = process_nested_data(data, uppercase)
print("\nExample 2: Uppercased strings")
print(json.dumps(uppercased, indent=2))
# Example 3: Collect all numeric values
def collect_numbers(data):
"""Specialized function to collect all numbers from nested structure"""
numbers = []
def collect(value):
if isinstance(value, (int, float)) and not isinstance(value, bool):
numbers.append(value)
return value
process_nested_data(data, collect)
return numbers
data = {
"a": 1,
"b": {
"c": 2,
"d": [3, 4, {"e": 5}]
}
}
numbers = collect_numbers(data)
print(f"\nExample 3: Collected numbers: {numbers}")
Python Output:
Example 1: Masked config
{
"database": {
"host": "db.example.com",
"password": "***MASKED***",
"replicas": [
{
"host": "db2.example.com",
"password": "***MASKED***"
}
]
}
}
Example 2: Uppercased strings
{
"name": "ALICE",
"settings": {
"theme": "DARK",
"language": "EN"
}
}
Example 3: Collected numbers: [1, 2, 3, 4, 5]
Decision Framework: Which Approach When?
| Goal | Approach | When to Use |
|---|---|---|
| Flatten to single level | `flatten_completely()` | Need simple list of all values, structure not important |
| Flatten dict keys | `flatten_dict()` | Configuration validation, environment variables, logging |
| Transform values | `process_nested_data()` | Masking, validation, type conversion, data sanitization |
| Find paths | `find_all_paths()` | Debugging, searching, duplicate detection |
| Query by path | `get_nested_value()` | Configuration access, API response parsing |
Complexity Analysis
Time Complexity: O(n)
- Each value in the structure is visited exactly once
- Dict iteration: O(k), where k is the number of keys
- List iteration: O(m), where m is the number of items
- Total: O(n), where n is the total number of values
Space Complexity: O(d)
- Call stack depth: d (maximum nesting depth)
- Each recursive call stores constant data (current dict/list reference)
- Visited set: O(v), where v is the number of container objects
- Total: O(d + v); typically O(d) dominates
Recurrence Relation:
T(n) = k·T(n/k) + O(1)
where k is the branching factor (number of children per container)
For balanced structures, this solves to O(n) by the Master Theorem (case 1: the O(1) work per call is dominated by the n^(log_k k) = n leaf calls).
Lessons Learned
1. Type-based recursion is fundamentally different
   - List recursion: first + rest pattern
   - Tree recursion: node + children pattern
   - Nested structure recursion: type determines recursion strategy
2. Strings are the edge case that breaks everything
- Always explicitly exclude strings from sequence handling
- isinstance(item, (list, tuple)) not isinstance(item, collections.abc.Sequence)
3. Circular references require explicit tracking
- Use id() to track object identity
- Clean up visited set during backtracking
- Provide clear indication when cycles are detected
4. Flexibility comes from separation of concerns
   - Traversal logic (how to recurse)
   - Transformation logic (what to do with values)
   - Filtering logic (which keys to transform)
   - Keep these separate for maximum reusability
5. The right abstraction makes everything easier
- Generic process_nested_data() handles all cases
- Specific functions (mask_sensitive_data(), flatten_dict()) are thin wrappers
- Build the general tool first, then specialize
Project: Build a Nested Data Explorer
Project Overview
Build a command-line tool that explores and analyzes nested data structures (JSON, Python dicts). Your tool should support:
- Interactive exploration: Navigate through nested structures
- Search functionality: Find values, keys, or patterns
- Statistics: Analyze structure depth, size, types
- Transformation: Apply operations to nested data
- Visualization: Display structure in readable format
Project Requirements
Core Features (Required)
1. Load and Display
   - Load JSON from file or Python dict from code
   - Display structure with proper indentation
   - Show types and values clearly
2. Navigation
- Navigate to specific paths (e.g., database.primary.host)
- List all keys at current level
- Move up/down the hierarchy
3. Search
   - Find all paths containing a specific value
   - Find all keys matching a pattern
   - Find all values of a specific type
4. Statistics
   - Calculate maximum nesting depth
   - Count total number of values
   - Count values by type
   - Identify largest nested structures
5. Transformation
   - Mask sensitive data
   - Convert all strings to uppercase/lowercase
   - Remove keys matching pattern
   - Flatten structure
Advanced Features (Optional)
6. Diff Tool
   - Compare two nested structures
   - Show what changed (added, removed, modified)
   - Highlight differences
7. Query Language
   - Support JSONPath-like queries
   - Filter by conditions
   - Extract matching subtrees
8. Validation
   - Validate structure against schema
   - Check for required keys
   - Verify value types
Starter Code
Here's a framework to get you started:
import json
from typing import Any, List, Dict, Callable
class NestedDataExplorer:
"""
Interactive explorer for nested data structures.
Your task: Implement the methods marked with TODO.
"""
def __init__(self, data: Any):
"""Initialize explorer with data structure"""
self.data = data
self.current_path = []
def display(self, data: Any = None, indent: int = 0) -> None:
"""
Display nested structure with proper formatting.
TODO: Implement this method
- Show dicts with their keys
- Show lists with indices
- Show atomic values with their types
- Use indentation to show nesting
"""
pass
def navigate_to(self, path: str) -> Any:
"""
Navigate to specific path in structure.
TODO: Implement this method
- Parse dot-notation path (e.g., "database.primary.host")
- Handle list indices (e.g., "replicas.0.host")
- Return value at path or None if not found
"""
pass
def find_value(self, target: Any) -> List[str]:
"""
Find all paths containing target value.
TODO: Implement this method
- Search entire structure recursively
- Return list of paths where value was found
- Handle all data types correctly
"""
pass
def find_keys(self, pattern: str) -> List[str]:
"""
Find all keys matching pattern (substring match).
TODO: Implement this method
- Search all dict keys recursively
- Return paths to matching keys
- Case-insensitive matching
"""
pass
def calculate_depth(self, data: Any = None) -> int:
"""
Calculate maximum nesting depth.
TODO: Implement this method
- Return 0 for atomic values
- Return 1 + max(child depths) for containers
- Handle both dicts and lists
"""
pass
def count_values(self, data: Any = None) -> int:
"""
Count total number of values in structure.
TODO: Implement this method
- Count all atomic values
- Don't count containers themselves
- Handle nested structures correctly
"""
pass
def type_statistics(self, data: Any = None) -> Dict[str, int]:
"""
Count values by type.
TODO: Implement this method
- Return dict mapping type names to counts
- Example: {"str": 5, "int": 3, "bool": 1}
- Handle nested structures recursively
"""
pass
def mask_sensitive(self, patterns: List[str]) -> Any:
"""
Mask values for keys matching patterns.
TODO: Implement this method
- Find keys containing any pattern
- Replace their values with "***MASKED***"
- Return new structure (don't modify original)
"""
pass
def flatten_to_paths(self, data: Any = None) -> Dict[str, Any]:
"""
Flatten structure to path: value mapping.
TODO: Implement this method
- Convert nested structure to flat dict
- Keys are dot-notation paths
- Example: {"database.host": "localhost"}
"""
pass
# Example usage and test cases
if __name__ == "__main__":
# Sample data
config = {
"database": {
"primary": {
"host": "db1.example.com",
"port": 5432,
"password": "secret123"
},
"replicas": [
{"host": "db2.example.com", "port": 5432},
{"host": "db3.example.com", "port": 5432}
]
},
"cache": {
"redis": {
"host": "cache.example.com",
"port": 6379
}
},
"debug": True
}
# Create explorer
explorer = NestedDataExplorer(config)
# Test your implementation
print("=== Display Structure ===")
explorer.display()
print("\n=== Navigate to Path ===")
value = explorer.navigate_to("database.primary.host")
print(f"database.primary.host = {value}")
print("\n=== Find Value ===")
paths = explorer.find_value(5432)
print(f"Paths containing 5432: {paths}")
print("\n=== Find Keys ===")
paths = explorer.find_keys("host")
print(f"Paths with 'host' in key: {paths}")
print("\n=== Statistics ===")
print(f"Max depth: {explorer.calculate_depth()}")
print(f"Total values: {explorer.count_values()}")
print(f"Type counts: {explorer.type_statistics()}")
print("\n=== Mask Sensitive ===")
masked = explorer.mask_sensitive(["password", "secret"])
print(json.dumps(masked, indent=2))
print("\n=== Flatten ===")
flat = explorer.flatten_to_paths()
for path, value in sorted(flat.items()):
print(f"{path}: {value}")
Implementation Guide
Phase 1: Core Traversal (Start Here)
1. Implement `display()` first
   - This helps you understand the structure
   - Use recursion with depth tracking
   - Format output clearly
2. Implement `navigate_to()`
   - Parse path string into parts
   - Navigate step by step
   - Handle missing keys gracefully
3. Implement `calculate_depth()`
   - Simple recursive function
   - Base case: atomic value → 0
   - Recursive case: 1 + max(child depths)
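As a reference point, the depth recursion described above can be sketched like this (one possible convention, under which atomic values have depth 0 and an empty container has depth 1; your implementation may count differently):

```python
def calculate_depth(data):
    """Maximum nesting depth: 0 for atomic values,
    1 + the deepest child for dicts/lists/tuples."""
    if isinstance(data, dict):
        children = data.values()
    elif isinstance(data, (list, tuple)):
        children = data
    else:
        return 0                  # base case: atomic value
    return 1 + max((calculate_depth(child) for child in children), default=0)

calculate_depth({"a": {"b": {"c": 1}}})
# → 3
```

The `default=0` handles empty containers, where `max()` would otherwise raise on an empty sequence.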
Phase 2: Search and Analysis
1. Implement `find_value()`
   - Reuse pattern from earlier in chapter
   - Build path as you recurse
   - Collect all matches
2. Implement `find_keys()`
   - Similar to `find_value()` but check keys
   - Use substring matching
   - Case-insensitive comparison
3. Implement `count_values()` and `type_statistics()`
   - Traverse entire structure
   - Accumulate counts
   - Distinguish containers from values
Phase 3: Transformations
1. Implement `mask_sensitive()`
   - Reuse pattern from Iteration 3
   - Check keys against patterns
   - Return new structure
2. Implement `flatten_to_paths()`
   - Reuse pattern from section 16.6
   - Build path strings as you recurse
   - Return flat dictionary
Testing Strategy
Create test cases for:
- Empty structures: `{}`, `[]`
- Single level: `{"a": 1}`
- Deep nesting: `{"a": {"b": {"c": 1}}}`
- Mixed types: dicts, lists, strings, numbers, booleans
- Edge cases: None values, empty strings, zero
- Large structures: load real JSON files
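The cases above translate directly into a small assertion harness; this sketch assumes the conventions used in this chapter, namely that containers add one level of depth (so `{}` has depth 1) and that `count_values()` counts only atomic values:

```python
def run_explorer_tests(explorer_cls):
    """Tiny harness for a NestedDataExplorer-style class. Assumes
    calculate_depth() counts containers ({} → 1) and count_values()
    counts only atomic values."""
    cases = [
        ({}, 1, 0),                         # empty dict
        ({"a": 1}, 1, 1),                   # single level
        ({"a": {"b": {"c": 1}}}, 3, 1),     # deep nesting
        ({"a": [1, 2], "b": None}, 2, 3),   # mixed types and None
    ]
    for data, expected_depth, expected_count in cases:
        explorer = explorer_cls(data)
        assert explorer.calculate_depth() == expected_depth, data
        assert explorer.count_values() == expected_count, data
    print("All explorer tests passed")
```

Run it as `run_explorer_tests(NestedDataExplorer)` once those two methods are implemented, and extend the case list as you add features.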
Extension Ideas
Once you have the core working:
- Add interactive mode: REPL-style interface
- Support file I/O: Load/save JSON files
- Add diff functionality: Compare two structures
- Implement query language: JSONPath-like syntax
- Add visualization: Tree-like display with ASCII art
- Performance optimization: Cache results, lazy evaluation
Example Output
Your completed tool should produce output like:
=== Display Structure ===
{dict}
database: {dict}
primary: {dict}
host: "db1.example.com" {str}
port: 5432 {int}
password: "secret123" {str}
replicas: {list}
[0]: {dict}
host: "db2.example.com" {str}
port: 5432 {int}
[1]: {dict}
host: "db3.example.com" {str}
port: 5432 {int}
cache: {dict}
redis: {dict}
host: "cache.example.com" {str}
port: 6379 {int}
debug: True {bool}
=== Statistics ===
Max depth: 4
Total values: 11
Type counts: {'str': 5, 'int': 4, 'bool': 1}
Submission Requirements
Your submission should include:
- Complete implementation of all required methods
- Test suite demonstrating all features
- Documentation explaining your design choices
- Example usage with real-world data (load a JSON file)
- Reflection on challenges faced and lessons learned
Evaluation Criteria
- Correctness: All methods work as specified
- Code quality: Clean, readable, well-documented
- Recursion usage: Appropriate use of recursive patterns
- Error handling: Graceful handling of edge cases
- Testing: Comprehensive test coverage
- Creativity: Interesting extensions or optimizations
Good luck! This project brings together everything you've learned about processing nested structures recursively.